When the Rain Stops: How Do Droughts Influence Bee Mortality?
Cedric Bilger, Hoang Trang
2025-01-31
1 Introduction
Bees are essential for maintaining ecosystems and supporting the
health of the biosphere through their role in pollination. Bees are key
pollinators for many fruits, vegetables, nuts, and seeds. Approximately
one-third of the human food supply depends on insect pollination,
predominantly by bees (Syngenta 2021).
This underscores why the European honey bee is considered the most
economically important pollinator for agricultural crops worldwide (Conte and Navajas 2008).
In recent
years, however, bee mortality and the decline of bee colonies have
raised serious concerns. This alarming trend is driven by a variety of
factors, with climate change emerging as a significant threat. Climate
change introduces multiple stressors, including rising temperatures and
an increasing frequency of droughts (Rankin,
Barney, and Lozano 2020).
To better understand the
impact of climate change on bee populations, it is crucial to examine
the relationship between environmental stressors and bee mortality
rates. Among these stressors, droughts may play a significant role due
to their potential direct and indirect impacts on floral resources and
habitat availability. In the following sections, the data is presented
and analyzed to explore the connection between droughts and bee
mortality.
2 Data Overview
2.1 Understanding the Drought Dataset
This section delves into the characteristics of the drought data,
including its sources, key variables, and temporal coverage.
The dataset is sourced from the TidyTuesday
repository and was originally compiled by the National Integrated
Drought Information System (NIDIS).
2.1.1 Dataset Structure
The drought data consists of two related datasets:
1. drought.csv: State-level data on drought and wetness conditions over time.
2. drought_fips.csv: County-level data on drought conditions using FIPS codes for localized analysis.
These datasets provide detailed information on drought frequency and
intensity across different regions and time periods within the United
States. The state-level data contains one record per month from 1895 to
2022, while the county-level data is recorded weekly from 2000 to 2022.
2.1.2 Insights into the drought.csv dataset
Key variables include DATE, which records the date of
observation, and severity levels such as D0 (abnormally
dry), D1 (moderate drought), D2 (severe
drought), D3 (extreme drought), and D4
(exceptional drought). Similarly, W0 to W4
track wetness levels, with W0 representing the least wet
conditions and W4 indicating the wettest conditions. The
state variable indicates the location.
To provide a closer look, the following table shows a preview of the
data:
Preview of the drought dataset (drought.csv)

| 0 | DATE | D0 | D1 | D2 | D3 | D4 | -9 | W0 | W1 | W2 | W3 | W4 | state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 59.4 | d_19780401 | 14.6 | 3.5 | 0.1 | 0.0 | 0.0 | 0 | 26.0 | 20.7 | 8.3 | 0.1 | 0.0 | alabama |
| 38.9 | d_19780501 | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 60.9 | 29.9 | 12.3 | 4.3 | 0.0 | alabama |
| 70.0 | d_19780601 | 0.9 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 29.0 | 22.5 | 10.2 | 3.9 | 0.5 | alabama |
| 41.0 | d_19780701 | 34.1 | 18.9 | 2.5 | 0.0 | 0.0 | 0 | 25.0 | 21.0 | 9.5 | 3.6 | 1.5 | alabama |
| 18.3 | d_19780801 | 57.4 | 48.0 | 24.0 | 10.2 | 1.4 | 0 | 24.3 | 17.2 | 4.5 | 2.3 | 1.1 | alabama |
| 26.3 | d_19780901 | 54.1 | 40.5 | 15.3 | 5.0 | 0.0 | 0 | 19.6 | 12.6 | 2.4 | 1.3 | 0.0 | alabama |
| 23.2 | d_19781001 | 75.1 | 64.7 | 36.9 | 18.9 | 5.2 | 0 | 1.6 | 0.2 | 0.0 | 0.0 | 0.0 | alabama |
| 36.4 | d_19781101 | 59.0 | 42.7 | 13.9 | 3.4 | 0.0 | 0 | 4.7 | 0.9 | 0.0 | 0.0 | 0.0 | alabama |
| 58.6 | d_19781201 | 34.4 | 13.6 | 0.5 | 0.0 | 0.0 | 0 | 7.0 | 1.6 | 0.0 | 0.0 | 0.0 | alabama |
| 69.4 | d_19790101 | 5.1 | 0.7 | 0.0 | 0.0 | 0.0 | 0 | 25.5 | 8.7 | 0.5 | 0.0 | 0.0 | alabama |
| 62.4 | d_19790201 | 25.0 | 7.0 | 0.1 | 0.0 | 0.0 | 0 | 12.5 | 4.0 | 0.1 | 0.0 | 0.0 | alabama |
We can see that, for example, in the fourth row of the table,
Alabama experienced significant drought conditions on July 1, 1978. On
this date, 34.1% of the state was categorized as abnormally dry
(D0), 18.9% as moderate drought (D1), and 2.5%
as severe drought (D2). No areas were recorded under
extreme drought (D3) or exceptional drought
(D4).
On the wetness side, 25.0% of the state was under least wet
conditions (W0), 21.0% under moderately wet conditions
(W1), 9.5% under very wet conditions (W2), and
3.6% under extremely wet conditions (W3). A small portion
(1.5%) of the state fell into the wettest conditions (W4)
category, indicating the highest levels of wetness during this time.
The attentive reader may have noticed the inclusion of the
0 and -9 columns in the dataset. These columns
serve an important purpose in interpreting the data. Judging from the
values in the preview, the 0 column records the percentage of the state
that falls into neither a drought nor a wetness category. For Alabama on
July 1, 1978, it shows 41.0%, and together with D0 (34.1%) and W0
(25.0%) this adds up to roughly 100%. The same arithmetic suggests that
the D and W columns are cumulative: D0 counts all area that is
abnormally dry or worse, so the percentages quoted above are best read
as "at least" the stated level.
On the other hand, the -9 column captures the percentage
of the state for which no data was available. This could include regions
not monitored or excluded from the analysis. For this particular date
and location, the -9 value is 0%, meaning that complete
data was recorded for Alabama.
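One way to check how the 0 and -9 columns relate to the others is to add them to D0 and W0 for a few preview rows. The sketch below copies three rows from the table above; the helper name coverage_total is made up for illustration, and the reading of D0 and W0 as cumulative totals is an assumption that the sums merely support.

```python
# Three rows copied from the drought.csv preview (Alabama, 1978).
preview_rows = [
    {"DATE": "d_19780401", "0": 59.4, "D0": 14.6, "-9": 0.0, "W0": 26.0},
    {"DATE": "d_19780701", "0": 41.0, "D0": 34.1, "-9": 0.0, "W0": 25.0},
    {"DATE": "d_19780801", "0": 18.3, "D0": 57.4, "-9": 0.0, "W0": 24.3},
]


def coverage_total(row):
    """Add the '0', D0, W0, and -9 percentages of one row."""
    return row["0"] + row["D0"] + row["W0"] + row["-9"]


# If D0 and W0 are cumulative, each total should be close to 100 percent.
totals = {row["DATE"]: round(coverage_total(row), 1) for row in preview_rows}
print(totals)
```

Totals that stay within rounding distance of 100 are consistent with the four columns together covering the whole state.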
2.1.3 Insights into the drought_fips.csv dataset
In addition to the main drought dataset,
drought.csv, the
drought_fips.csv dataset provides detailed
information on drought conditions categorized by the Federal Information
Processing Standards (FIPS) code, a unique identifier for geographical
regions within the United States. This additional data allows for a more
granular analysis of drought conditions at the county or region level,
helping to track localized drought trends over time.
To illustrate this further, the following table shows a preview of the
drought_fips.csv dataset:
Preview of the drought-fips dataset (drought_fips.csv)

| State | FIPS | DSCI | date |
|---|---|---|---|
| AL | 01003 | 100 | 2006-11-14 |
| AL | 01003 | 87 | 2006-11-21 |
| AL | 01003 | 80 | 2006-11-28 |
| AL | 01003 | 100 | 2006-12-05 |
| AL | 01003 | 98 | 2006-12-12 |
| AL | 01003 | 99 | 2006-12-19 |
For instance, looking at the second row of the table, the
FIPS code 01003 corresponds to Baldwin County in Alabama,
as confirmed by the State column showing AL. The DSCI
(Drought Severity and Coverage Index) is a measure used to assess the
severity and extent of drought conditions, with values ranging from 0
(no drought anywhere in the area) to 500 (the entire area in exceptional
drought, D4). On November 21, 2006, the DSCI value for Baldwin County is
recorded as 87, which lies toward the lower end of this scale and
indicates comparatively mild drought conditions in this area.
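To make the 0 to 500 range concrete, here is a minimal sketch of how such an index can be computed. It assumes the definition published by the U.S. Drought Monitor, in which the DSCI is the sum of the cumulative percentages of an area in D0 through D4; the function name dsci and the example inputs are illustrative and not taken from the dataset.

```python
def dsci(d0, d1, d2, d3, d4):
    """Sum cumulative drought percentages (each 0-100) into a 0-500 index.

    Each argument is the share of the area at that level or worse, so an
    area entirely in D4 contributes 100 to every term.
    """
    return d0 + d1 + d2 + d3 + d4


no_drought = dsci(0, 0, 0, 0, 0)            # lower bound of the scale
all_exceptional = dsci(100, 100, 100, 100, 100)  # upper bound of the scale
mostly_abnormally_dry = dsci(87, 0, 0, 0, 0)     # one way to reach a value of 87
print(no_drought, all_exceptional, mostly_abnormally_dry)
```

Under this assumption, a value below 100 can arise when only part of the area is abnormally dry and nothing is in a more severe category.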
2.2 Understanding the Bee Mortality Dataset
This section explores the characteristics of the bee mortality
dataset, including its sources, key variables, and structure.
The dataset is sourced from the TidyTuesday
repository and provides insights into colony losses and stressors
affecting honeybee populations across the United States over
time.
2.2.1 Dataset Structure
The bee mortality data consists of two related datasets:
1. colony.csv: Contains data on the number
of bee colonies, their losses, and related statistics.
2. stressor.csv: Focuses on stressors
affecting the colonies, including pests, diseases, and environmental
factors.
Both datasets cover the years 2015 to 2021, with all U.S. states
included.
2.2.2 Insights into the colony.csv dataset
The colony.csv dataset contains several
key variables that provide detailed insights into bee colony dynamics
over time. The year variable indicates the year of
observation, while months specifies the time period within the year,
such as January to March. The state variable identifies the
U.S. state being observed, allowing for geographic analysis of bee
colonies.
Data on colony numbers is captured in several fields:
colony_n records the number of colonies at the start of the
period, and colony_max represents the maximum number of
colonies during the same timeframe. The dataset also tracks losses and
additions to colonies, with colony_lost detailing the
number of colonies lost and colony_lost_pct expressing this
loss as a percentage. Similarly, colony_added reflects the
number of colonies added during the period.
Efforts to improve
or maintain colonies are represented by colony_reno, which
shows the number of colonies renovated, and
colony_reno_pct, which provides the percentage of colonies
that underwent renovation.
To better understand the data, the following table provides a preview of
the colony.csv dataset:
Preview of the bee mortality dataset (colony.csv)

| year | months | state | colony_n | colony_max | colony_lost | colony_lost_pct | colony_added | colony_reno | colony_reno_pct |
|---|---|---|---|---|---|---|---|---|---|
| 2015 | January-March | Alabama | 7000 | 7000 | 1800 | 26 | 2800 | 250 | 4 |
| 2015 | January-March | Arizona | 35000 | 35000 | 4600 | 13 | 3400 | 2100 | 6 |
| 2015 | January-March | Arkansas | 13000 | 14000 | 1500 | 11 | 1200 | 90 | 1 |
| 2015 | January-March | California | 1440000 | 1690000 | 255000 | 15 | 250000 | 124000 | 7 |
| 2015 | January-March | Colorado | 3500 | 12500 | 1500 | 12 | 200 | 140 | 1 |
| 2015 | January-March | Connecticut | 3900 | 3900 | 870 | 22 | 290 | NA | NA |
In the third row of
the table, the data pertains to Arkansas during the period of
January–March 2015. The colony_n value shows that the state
started with 13,000 colonies, which increased to a maximum of 14,000
(colony_max). However, 1,500 colonies were lost during this
time (colony_lost), accounting for an 11% loss
(colony_lost_pct). Arkansas added 1,200 colonies
(colony_added) and renovated 90 colonies
(colony_reno), representing 1% of colonies renovated
(colony_reno_pct).
2.2.3 Insights into the stressor.csv dataset
The stressor.csv dataset captures key
information about the factors affecting bee colonies across the United
States. It includes the variable year, representing the
year of observation, and the variable months, specifying
the time period within that year. Observations are categorized by
state, providing insights into regional differences in the
challenges faced by bee populations. The dataset also identifies
specific stressor types, such as Varroa mites or
pesticides, and quantifies their impact through the
stress_pct variable, which represents the percentage of
colonies affected.
The following table provides a preview of the
stressor.csv dataset:
Preview of the stressor dataset (stressor.csv)

| year | months | state | stressor | stress_pct |
|---|---|---|---|---|
| 2015 | January-March | Alabama | Varroa mites | 10.0 |
| 2015 | January-March | Alabama | Other pests/parasites | 5.4 |
| 2015 | January-March | Alabama | Diseases | NA |
| 2015 | January-March | Alabama | Pesticides | 2.2 |
| 2015 | January-March | Alabama | Other | 9.1 |
| 2015 | January-March | Alabama | Unknown | 9.4 |
In the first row of
the table, the data pertains to Alabama during January–March 2015. The
stressor is “Varroa mites,” which impacted 10.0% (stress_pct) of the bee
colonies during this time. Varroa mites are a significant stressor known
to harm bee health and contribute to colony losses.
3 Our Hypothesis
Drought conditions negatively impact the availability of forage
resources, such as flowers, for bees, thereby reducing their food supply
and increasing stress on bee populations. As a result, we expect that
areas experiencing more severe drought conditions (as indicated by
higher DSCI values) will show higher percentages of stressed colonies or
even lost colonies.
However, this expected correlation is also
influenced by the region and season. In some regions, the impact of
drought on bee colonies will be more severe due to factors like local
climate, typical drought frequency, and forage availability.
3.1 Important variables
To test our hypothesis, we rely on several key variables from the
datasets:
D0 - D4 and W0 - W4 (from the drought.csv dataset): These variables represent drought severity (D0 to D4) and wetness levels (W0 to W4), which allow us to assess the impact of drought and moisture conditions on bee health.
colony_lost (from the colony.csv dataset): This variable indicates the number of colonies lost during a given period, which is crucial for understanding the relationship between drought conditions and colony stress.
stress_pct (from the stressor.csv dataset): This variable shows the percentage of colonies affected by each recorded stressor, such as Varroa mites or pesticides, helping us measure the stress on bee populations.
state and date/months/year (from both the drought.csv and stressor.csv datasets): These variables provide important regional and temporal context, enabling us to explore seasonal and regional variations in drought impact on bee colonies.
4 Preprocessing
In this chapter, we focus on preparing the datasets introduced in
the Data Overview chapter for
subsequent analysis. First, variables are standardized to ensure
consistent naming conventions and formats. Next, the data is filtered to
a common time period. Following this, the relevant datasets are
merged, and finally, any missing data is addressed.
4.1 Standardizing Variable Names and Formats
The drought.csv dataset initially
stored dates in a single variable, DATE, formatted as
d_YYYYMMDD (for example, d_19780701), while the
drought_fips.csv dataset stores year,
month and day as separate variables. The DATE variable was therefore
split into three separate variables: year,
month, and day.
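The split described above can be sketched as follows. This is an illustrative Python version, not the code used for the report; the function name split_drought_date is hypothetical, and the full month name and zero-padded day mirror the format visible in the transformed preview.

```python
from datetime import datetime


def split_drought_date(raw):
    """Split a string such as 'd_19780701' into (year, month name, day)."""
    # The literal 'd_' prefix is matched directly by the format string.
    parsed = datetime.strptime(raw, "d_%Y%m%d")
    return parsed.year, parsed.strftime("%B"), parsed.strftime("%d")


year, month, day = split_drought_date("d_19780701")
print(year, month, day)
```

Keeping the day as a zero-padded string matches the "01" values shown in the previews, which makes later key-based joins less error-prone.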
Here is a glimpse of
the transformed drought.csv dataset:
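Preview of the transformed drought dataset

| 0 | DATE | D0 | D1 | D2 | D3 | D4 | -9 | W0 | W1 | W2 | W3 | W4 | state | year | month | day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | d_18950101 | 0 | 0 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0 | 0 | alabama | 1895 | January | 01 |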
In the drought_fips dataset, months are represented numerically, and
state names are abbreviated. To ensure consistency, the data has been
transformed to use full month names and state names written in full.
Below is a preview of the transformed dataset:
Preview of the transformed drought_fips dataset

| state | FIPS | DSCI | year | month | day |
|---|---|---|---|---|---|
| alaska | 02013 | 0 | 2000 | January | 04 |
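The renaming of months and states can be sketched with two lookups. The example below is a minimal illustration: STATE_NAMES is a deliberately partial, hypothetical mapping (a complete one would list every state), and the record layout is assumed from the preview.

```python
import calendar

# Partial lookup for illustration only; a full table would cover all states.
STATE_NAMES = {"AL": "alabama", "AK": "alaska", "AZ": "arizona"}


def standardize(record):
    """Return a copy with a full lowercase state name and a full month name."""
    return {
        **record,
        "state": STATE_NAMES[record["state"]],
        "month": calendar.month_name[int(record["month"])],
    }


raw_record = {"state": "AK", "FIPS": "02013", "DSCI": 0,
              "year": 2000, "month": "01", "day": "04"}
clean_record = standardize(raw_record)
print(clean_record)
```

FIPS codes are kept as strings so that leading zeros such as in 02013 are preserved.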
Now, all shared variables across the
drought.csv,
drought_fips.csv, and
colony.csv datasets have been standardized
to follow a consistent naming convention and format. This
standardization ensures compatibility, allowing the datasets to be
merged seamlessly for comprehensive analysis.
4.2 Connecting Datasets: Temporal Overlap
To ensure the datasets are compatible for analysis, it is important
to align their time periods. The
colony.csv dataset spans the years 2015 to
2021, while the drought.csv and
drought_fips.csv datasets cover a broader
time range. The drought datasets are therefore filtered to the years
2015 to 2021, removing all observations outside this range so that the
datasets align temporally.
Summary of the drought Dataset Before and After Filtering

| Stage | Rows | Date_Range |
|---|---|---|
| Before Filtering | 73344 | 1895 - 2022 |
| After Filtering | 3792 | 2015 - 2021 |

Summary of the drought_fips Dataset Before and After Filtering

| Stage | Rows | Date_Range |
|---|---|---|
| Before Filtering | 3771791 | 2000 - 2022 |
| After Filtering | 1104803 | 2015 - 2021 |
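The filtering step amounts to keeping only rows whose year falls inside the range covered by the colony data. A minimal sketch, assuming each row is a dictionary with a numeric year field; the function name filter_years and the sample rows are illustrative:

```python
def filter_years(rows, first=2015, last=2021):
    """Keep only rows whose 'year' lies in the inclusive range first..last."""
    return [row for row in rows if first <= row["year"] <= last]


# Illustrative rows spanning the original range of the drought data.
sample_rows = [
    {"state": "alabama", "year": 1895},
    {"state": "alabama", "year": 2015},
    {"state": "alabama", "year": 2021},
    {"state": "alabama", "year": 2022},
]
kept_rows = filter_years(sample_rows)
print(len(sample_rows), "rows before,", len(kept_rows), "rows after")
```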
4.3 Combining the data
Now that all variable names are standardized and the time spans of
the datasets are aligned, they can be seamlessly combined into a single
dataset. However, there is a difference in the level of detail in the
time data across the datasets. The
drought_fips.csv dataset includes multiple
data points within each month, whereas the
drought.csv dataset records data only on
the first day of each month.
Upon further investigation, it was
determined that the drought dataset is based on weekly updates, as
detailed on the U.S.
Drought Monitor website. The data recorded on the first day of each
month appears to represent an aggregated value for the preceding
month.
To facilitate merging the drought datasets, the entries
in the drought_fips.csv dataset are
averaged for each month and year. The resulting dataset sets the
day variable to “01” for each entry, aligning with the
temporal structure of the drought data.
Here is a side-by-side comparison to illustrate the changes made to the
dataset:
Preview of the drought fips dataset before averaging

| state | FIPS | DSCI | year | month | day |
|---|---|---|---|---|---|
| alaska | 02013 | 0 | 2015 | January | 06 |
| alaska | 02013 | 0 | 2015 | January | 13 |
| alaska | 02013 | 0 | 2015 | January | 20 |
| alaska | 02013 | 0 | 2015 | January | 27 |
| alaska | 02013 | 0 | 2015 | February | 03 |

Preview of the drought fips dataset after averaging

| state | FIPS | avg_DSCI | year | month | day |
|---|---|---|---|---|---|
| alabama | 01001 | 50.0 | 2015 | April | 01 |
| alabama | 01001 | 4.5 | 2015 | August | 01 |
| alabama | 01001 | 0.0 | 2015 | December | 01 |
| alabama | 01001 | 100.0 | 2015 | February | 01 |
| alabama | 01001 | 50.0 | 2015 | January | 01 |
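The monthly averaging can be sketched as a group-and-mean over state, FIPS code, year, and month. The example below is an assumption about the procedure rather than the report's own code; it reuses the six Baldwin County readings from the drought_fips.csv preview in section 2.1.3 purely because those values are known, and the function name monthly_average is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def monthly_average(rows):
    """Average DSCI per (state, FIPS, year, month) and set day to '01'."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["state"], row["FIPS"], row["year"], row["month"])
        groups[key].append(row["DSCI"])
    return [
        {"state": state, "FIPS": fips, "avg_DSCI": mean(values),
         "year": year, "month": month, "day": "01"}
        for (state, fips, year, month), values in groups.items()
    ]


# Weekly readings for FIPS 01003 as shown in the drought_fips.csv preview.
weekly_rows = [
    {"state": "alabama", "FIPS": "01003", "DSCI": 100, "year": 2006, "month": "November", "day": "14"},
    {"state": "alabama", "FIPS": "01003", "DSCI": 87, "year": 2006, "month": "November", "day": "21"},
    {"state": "alabama", "FIPS": "01003", "DSCI": 80, "year": 2006, "month": "November", "day": "28"},
    {"state": "alabama", "FIPS": "01003", "DSCI": 100, "year": 2006, "month": "December", "day": "05"},
    {"state": "alabama", "FIPS": "01003", "DSCI": 98, "year": 2006, "month": "December", "day": "12"},
    {"state": "alabama", "FIPS": "01003", "DSCI": 99, "year": 2006, "month": "December", "day": "19"},
]
monthly_rows = monthly_average(weekly_rows)
for row in monthly_rows:
    print(row["month"], row["avg_DSCI"])
```

Only the weeks visible in the preview are included here, so the resulting averages illustrate the mechanics and are not the true monthly values.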
The drought-related datasets are now combined into a single, unified
dataset. A preview of the combined dataset is shown below:
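Preview of the combined drought dataset

| 0 | DATE | D0 | D1 | D2 | D3 | D4 | -9 | W0 | W1 | W2 | W3 | W4 | state | year | month | day | FIPS | avg_DSCI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 49.5 | d_20150101 | 40.6 | 21.5 | 1.1 | 0.1 | 0 | 0 | 9.8 | 3.8 | 0.1 | 0 | 0 | alabama | 2015 | January | 01 | 01001 | 50.0 |
| 49.5 | d_20150101 | 40.6 | 21.5 | 1.1 | 0.1 | 0 | 0 | 9.8 | 3.8 | 0.1 | 0 | 0 | alabama | 2015 | January | 01 | 01003 | 157.5 |
| 49.5 | d_20150101 | 40.6 | 21.5 | 1.1 | 0.1 | 0 | 0 | 9.8 | 3.8 | 0.1 | 0 | 0 | alabama | 2015 | January | 01 | 01005 | 47.0 |
| 49.5 | d_20150101 | 40.6 | 21.5 | 1.1 | 0.1 | 0 | 0 | 9.8 | 3.8 | 0.1 | 0 | 0 | alabama | 2015 | January | 01 | 01007 | 0.0 |
| 49.5 | d_20150101 | 40.6 | 21.5 | 1.1 | 0.1 | 0 | 0 | 9.8 | 3.8 | 0.1 | 0 | 0 | alabama | 2015 | January | 01 | 01009 | 0.0 |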
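The preview shows each state-level row repeated once per county, which is what a join on state, year, and month produces. The sketch below illustrates such a join in plain Python. The join type is an assumption (an inner join is used here; the report does not state which was applied), the function name merge_drought is hypothetical, and only a few columns from the preview are included.

```python
def merge_drought(state_rows, county_rows):
    """Attach state-level columns to each county row sharing state/year/month."""
    index = {(r["state"], r["year"], r["month"]): r for r in state_rows}
    merged = []
    for county in county_rows:
        key = (county["state"], county["year"], county["month"])
        if key in index:  # inner join: counties without a state record are dropped
            merged.append({**index[key], **county})
    return merged


# Values taken from the combined preview (Alabama, January 2015).
state_level = [{"state": "alabama", "year": 2015, "month": "January", "D0": 40.6, "W0": 9.8}]
county_level = [
    {"state": "alabama", "year": 2015, "month": "January", "FIPS": "01001", "avg_DSCI": 50.0},
    {"state": "alabama", "year": 2015, "month": "January", "FIPS": "01003", "avg_DSCI": 157.5},
]
combined = merge_drought(state_level, county_level)
for row in combined:
    print(row["FIPS"], row["D0"], row["avg_DSCI"])
```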
To align the combined dataset with the
colony.csv dataset, it is necessary to
aggregate the data by quarters, as the
colony.csv dataset contains values for the
following periods: January-March, April-June, July-September, and
October-December. Consequently, the variables
D0-D4, -9,
W0-W4, and the previously averaged
avg_DSCI must be averaged across the respective periods.
The following provides a preview:
Preview of the averaged combined drought dataset

| state | year | months | avg_D0 | avg_D1 | avg_D2 | avg_D3 | avg_minus9 | avg_W0 | avg_W1 | avg_W2 | avg_W3 | avg_W4 | avg_DSCI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| alabama | 2015 | April-June | 20.70000 | 6.7000000 | 0.0666667 | 0.0000000 | 0 | 16.133333 | 3.400000 | 0.1333333 | 0.00000 | 0.000000 | 36.84080 |
| alabama | 2015 | January-March | 53.86667 | 40.1666667 | 18.6666667 | 7.1666667 | 0 | 8.433333 | 3.566667 | 0.3666667 | 0.00000 | 0.000000 | 48.83557 |
| alabama | 2015 | July-September | 34.03333 | 10.6000000 | 0.5333333 | 0.0333333 | 0 | 6.333333 | 1.500000 | 0.0000000 | 0.00000 | 0.000000 | 57.57786 |
| alabama | 2015 | October-December | 12.50000 | 3.5000000 | 0.1666667 | 0.0000000 | 0 | 39.700000 | 29.633333 | 17.1000000 | 10.43333 | 5.600000 | 29.27587 |
| alabama | 2016 | April-June | 1.00000 | 0.1666667 | 0.0000000 | 0.0000000 | 0 | 75.566667 | 59.766667 | 28.6000000 | 14.36667 | 3.033333 | 56.16891 |
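The quarterly aggregation can be sketched by mapping each month to its period label and averaging within each state, year, and period. This is an illustrative version under stated assumptions: the function name quarterly_average is hypothetical, and the three monthly D0 values are made up to keep the arithmetic easy to follow.

```python
from collections import defaultdict
from statistics import mean

# Map each month name to the period label used in colony.csv.
QUARTERS = {
    "January": "January-March", "February": "January-March", "March": "January-March",
    "April": "April-June", "May": "April-June", "June": "April-June",
    "July": "July-September", "August": "July-September", "September": "July-September",
    "October": "October-December", "November": "October-December", "December": "October-December",
}


def quarterly_average(rows, columns):
    """Average the given columns per (state, year, quarter)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["state"], row["year"], QUARTERS[row["month"]])].append(row)
    result = []
    for (state, year, months), members in groups.items():
        averaged = {"state": state, "year": year, "months": months}
        for column in columns:
            averaged["avg_" + column] = mean(member[column] for member in members)
        result.append(averaged)
    return result


# Hypothetical monthly values, chosen only for illustration.
monthly = [
    {"state": "alabama", "year": 2015, "month": "January", "D0": 30.0},
    {"state": "alabama", "year": 2015, "month": "February", "D0": 60.0},
    {"state": "alabama", "year": 2015, "month": "March", "D0": 90.0},
]
quarterly = quarterly_average(monthly, ["D0"])
print(quarterly)
```

Because the combined dataset holds one row per county and month, a real run would average across counties as well as months within each period.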
Finally, all relevant datasets are consolidated into a single unified
dataset.
Below is a preview of its current structure:
4.4 Missing data
Before delving into the analysis, it is crucial to evaluate data
completeness. Missing values can affect both accuracy and reliability,
making it important to first determine how many rows contain ‘NA’ values
across the datasets.
## [1] 613
A total of 613 missing values are present, distributed as shown in
the table below.
The following displays a row with missing data. In this example,
the variable colony_reno_pct is missing.
After identifying the missing values, the next step is to remove the
rows containing any missing data. To ensure that no missing values
remain in the dataset, the total count of missing values is
recalculated, which should yield zero.
## [1] 0
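The count, drop, and recount sequence can be sketched as follows. The example represents NA as None and uses two rows from the colony.csv preview, one complete (Colorado) and one with missing renovation values (Connecticut); the function names are hypothetical and only a subset of columns is shown.

```python
def count_missing(rows):
    """Count individual missing (None) values across all rows."""
    return sum(value is None for row in rows for value in row.values())


def drop_incomplete(rows):
    """Keep only rows in which no value is missing."""
    return [row for row in rows if None not in row.values()]


# Two rows from the colony.csv preview; NA is represented as None.
colony_rows = [
    {"state": "Colorado", "colony_lost_pct": 12, "colony_reno": 140, "colony_reno_pct": 1},
    {"state": "Connecticut", "colony_lost_pct": 22, "colony_reno": None, "colony_reno_pct": None},
]
missing_before = count_missing(colony_rows)
complete_rows = drop_incomplete(colony_rows)
missing_after = count_missing(complete_rows)
print(missing_before, len(complete_rows), missing_after)
```

Dropping every row with any missing value is the simplest option, although it also discards the valid loss percentages stored in those rows.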
5 Exploring the datasets
First, the data on bee colony mortality is examined. The following
visualization illustrates colony loss percentages across U.S. states
over multiple years, capturing both temporal and spatial trends. A color
gradient is used, with blue representing low loss rates and yellow
indicating high loss rates. Most states show colony loss percentages
between 10% and 20%, indicating relatively stable losses rather than
drastic changes. Only in a few states do the values rise significantly,
reaching higher percentages. These exceptions might indicate regions
where colony loss could be driven by specific agricultural factors.
The graph below provides additional insight into the percentage of
colony loss across various U.S. states from 2015 to 2021. In this graph,
we can clearly see that most states experience colony loss percentages
between 10% and 20%, with only a few states showing occasional spikes
above this range.
The drought data can be visualized in a similar manner, using the
Drought Severity and Coverage Index (DSCI) as a measure of drought
intensity and extent. The DSCI ranges from 0 (indicating no drought) to
500 (the entire area in exceptional drought). The visualization clearly
highlights a regional pattern, with significantly more severe drought
conditions in the western United States compared to the eastern
regions.
Here, the DSCI is
depicted across all years for all states.
Below, interactive buttons provide access to state-specific
visualizations, allowing for a detailed examination of the relationship
between colony loss percentages and drought conditions. Clicking on a
button reveals two plots for the selected state. The first plot
illustrates the trend of colony loss percentages alongside the Drought
Severity and Coverage Index (DSCI) over time, offering insights into
overall drought severity. The second plot presents colony loss
percentages in relation to the drought levels D0-D3, enabling a more
granular comparison of different drought intensities. Feel free to
explore the data by selecting different states to uncover potential
patterns and regional variations in bee colony losses and drought
conditions.
6 Analysis
Looking at the data, there does not seem to be a clear correlation
between drought severity and bee colony losses. In certain cases, such
as in Maine from 2016 to 2018, an increase in drought severity appears
to correspond with a rise in colony loss percentage. However, this
pattern is not consistently observed across all states and time periods,
suggesting that additional factors may influence colony losses beyond
drought conditions alone.
For example, in Mississippi from 2016 to 2018, there appears to be no
noticeable effect of drought on colony losses. Despite a significant
peak in drought severity during this period, colony loss percentages
remain relatively stable.
In some cases, such as in Illinois, colony losses even exhibit
seemingly random fluctuations with no clear connection to drought
levels.
This indicates that colony loss is likely driven by multiple factors
beyond just drought severity.
This is further reflected in the correlation coefficients between the
percentage of colonies lost and drought levels (D0-D3). A selection of
these coefficients for various states is presented in the table
below.
Pearson Correlation Between Colony Loss and Drought Levels (D0-D3)

| State | D0 | D1 | D2 | D3 |
|---|---|---|---|---|
| Alabama | -0.023 | 0.007 | 0.031 | 0.024 |
| Arizona | 0.275 | 0.234 | 0.144 | 0.11 |
| Arkansas | 0.113 | 0.128 | 0.095 | 0.067 |
| California | -0.266 | -0.319 | -0.315 | -0.301 |
| Colorado | 0.138 | 0.138 | 0.16 | 0.16 |
| Connecticut | 0.184 | 0.085 | 0.048 | 0.001 |
| Florida | -0.302 | -0.313 | -0.382 | -0.382 |
| Georgia | -0.241 | -0.264 | -0.282 | -0.277 |
| Maine | 0.1 | 0.146 | 0.158 | 0.182 |
| Mississippi | -0.359 | -0.407 | -0.365 | -0.327 |
| Illinois | 0.236 | 0.188 | 0.094 | 0.183 |
The correlation coefficient (r) is a statistical measure that
quantifies the strength and direction of the relationship between two
variables. A value of r = 1 indicates a perfect positive correlation,
meaning that as one variable increases, the other also increases in a
perfectly linear manner. Conversely, r = -1 represents a perfect
negative correlation, where an increase in one variable corresponds to a
decrease in the other. A value of r = 0 suggests no linear relationship
between the variables. Based on the correlation values presented, there
appears to be little to no consistent relationship between drought
severity and bee colony mortality.
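For readers who want to see how such a coefficient is obtained, the sketch below implements the standard Pearson formula: the covariance of the two series divided by the product of their standard deviations. The function name pearson_r and the toy series are illustrative and do not come from the datasets.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(spread_x * spread_y)


# Toy series: one rises with the other, one falls as the other rises.
rising = pearson_r([1, 2, 3, 4], [10, 20, 30, 40])
falling = pearson_r([1, 2, 3, 4], [40, 30, 20, 10])
print(round(rising, 3), round(falling, 3))
```

Values such as those in the table, mostly between -0.4 and 0.3, sit far from either extreme, which is why they point to a weak linear relationship at best.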
7 Conclusion
As mentioned above, our hypothesis suggested a link between drought
severity and colony losses, with the expectation that higher DSCI values
would correspond to increased colony loss percentages. Our analysis
does not support this expectation: while drought conditions may
contribute to colony stress in certain cases, they do not consistently
drive colony losses on their own, and drought alone is not a reliable
predictor of colony loss. Instead, multiple factors likely interact to
influence bee population declines, underscoring the complexity of this
issue and highlighting the need for a multifactorial approach in future
research and conservation efforts.
Rankin, Erin E. Wilson, Sarah K. Barney, and Giselle E. Lozano. 2020.
“Reduced Water Negatively Impacts Social Bee Survival and
Productivity via Shifts in Floral Nutrition.” Journal of
Insect Science 20 (5): 15. https://doi.org/10.1093/jisesa/ieaa114.